Admin

Load packages, utility functions, global variables

source("~scripts/00 - Admin.R")
source("~scripts/01 - Utility Functions.R")

Explore OSM data

Resources

Vignette: https://cran.r-project.org/web/packages/osmdata/vignettes/osmdata.html

Map features: https://wiki.openstreetmap.org/wiki/Map_Features

?available_features() and ?available_tags([feature])

Baltimore

Query parameters

List of parameters:

  • City name
  • key
  • value
Balt_bbox <- getbb("Baltimore") %>% 
  opq()

Transit

Query the bus stops in each city.

Tags used:

  • public_transport=*
source("~scripts/12 - Read Transit SDG data.R")
## 
Downloading: 16 kB     
Downloading: 16 kB     
Downloading: 16 kB     
Downloading: 16 kB     
Downloading: 25 kB     
Downloading: 25 kB     
Downloading: 25 kB     
Downloading: 25 kB     
Downloading: 41 kB     
Downloading: 41 kB     
Downloading: 49 kB     
Downloading: 49 kB     
Downloading: 65 kB     
Downloading: 65 kB     
Downloading: 65 kB     
Downloading: 65 kB     
Downloading: 65 kB     
Downloading: 65 kB     
Downloading: 81 kB     
Downloading: 81 kB     
Downloading: 81 kB     
Downloading: 81 kB     
Downloading: 90 kB     
Downloading: 90 kB     
Downloading: 110 kB     
Downloading: 110 kB     
Downloading: 110 kB     
Downloading: 110 kB     
Downloading: 130 kB     
Downloading: 130 kB     
Downloading: 140 kB     
Downloading: 140 kB     
Downloading: 150 kB     
Downloading: 150 kB     
Downloading: 160 kB     
Downloading: 160 kB     
Downloading: 160 kB     
Downloading: 160 kB     
Downloading: 180 kB     
Downloading: 180 kB     
Downloading: 180 kB     
Downloading: 180 kB     
Downloading: 190 kB     
Downloading: 190 kB     
Downloading: 190 kB     
Downloading: 190 kB     
Downloading: 210 kB     
Downloading: 210 kB     
Downloading: 230 kB     
Downloading: 230 kB     
Downloading: 230 kB     
Downloading: 230 kB     
Downloading: 240 kB     
Downloading: 240 kB     
Downloading: 250 kB     
Downloading: 250 kB     
Downloading: 250 kB     
Downloading: 250 kB     
Downloading: 280 kB     
Downloading: 280 kB     
Downloading: 280 kB     
Downloading: 280 kB     
Downloading: 300 kB     
Downloading: 300 kB     
Downloading: 320 kB     
Downloading: 320 kB     
Downloading: 330 kB     
Downloading: 330 kB     
Downloading: 350 kB     
Downloading: 350 kB     
Downloading: 350 kB     
Downloading: 350 kB     
Downloading: 25 kB     
Downloading: 25 kB     
Downloading: 25 kB     
Downloading: 25 kB     
Downloading: 49 kB     
Downloading: 49 kB     
Downloading: 49 kB     
Downloading: 49 kB     
Downloading: 49 kB     
Downloading: 49 kB     
Downloading: 65 kB     
Downloading: 65 kB     
Downloading: 65 kB     
Downloading: 65 kB     
Downloading: 65 kB     
Downloading: 65 kB     
Downloading: 81 kB     
Downloading: 81 kB     
Downloading: 81 kB     
Downloading: 81 kB     
Downloading: 90 kB     
Downloading: 90 kB     
Downloading: 96 kB     
Downloading: 96 kB     
Downloading: 96 kB     
Downloading: 96 kB     
Downloading: 96 kB     
Downloading: 96 kB     
Downloading: 110 kB     
Downloading: 110 kB     
Downloading: 120 kB     
Downloading: 120 kB     
Downloading: 130 kB     
Downloading: 130 kB     
Downloading: 130 kB     
Downloading: 130 kB     
Downloading: 130 kB     
Downloading: 130 kB     
Downloading: 150 kB     
Downloading: 150 kB     
Downloading: 150 kB     
Downloading: 150 kB     
Downloading: 150 kB     
Downloading: 150 kB     
Downloading: 160 kB     
Downloading: 160 kB     
Downloading: 160 kB     
Downloading: 160 kB     
Downloading: 180 kB     
Downloading: 180 kB     
Downloading: 180 kB     
Downloading: 180 kB     
Downloading: 190 kB     
Downloading: 190 kB     
Downloading: 190 kB     
Downloading: 190 kB     
Downloading: 190 kB     
Downloading: 190 kB     
Downloading: 190 kB     
Downloading: 190 kB     
Downloading: 220 kB     
Downloading: 220 kB     
Downloading: 230 kB     
Downloading: 230 kB     
Downloading: 240 kB     
Downloading: 240 kB     
Downloading: 240 kB     
Downloading: 240 kB     
Downloading: 250 kB     
Downloading: 250 kB     
Downloading: 260 kB     
Downloading: 260 kB     
Downloading: 270 kB     
Downloading: 270 kB     
Downloading: 280 kB     
Downloading: 280 kB     
Downloading: 280 kB     
Downloading: 280 kB     
Downloading: 290 kB     
Downloading: 290 kB     
Downloading: 300 kB     
Downloading: 300 kB     
Downloading: 300 kB     
Downloading: 300 kB     
Downloading: 310 kB     
Downloading: 310 kB     
Downloading: 310 kB     
Downloading: 310 kB     
Downloading: 320 kB     
Downloading: 320 kB     
Downloading: 330 kB     
Downloading: 330 kB     
Downloading: 330 kB     
Downloading: 330 kB     
Downloading: 360 kB     
Downloading: 360 kB     
Downloading: 360 kB     
Downloading: 360 kB     
Downloading: 370 kB     
Downloading: 370 kB     
Downloading: 370 kB     
Downloading: 370 kB     
Downloading: 390 kB     
Downloading: 390 kB     
Downloading: 400 kB     
Downloading: 400 kB     
Downloading: 400 kB     
Downloading: 400 kB     
Downloading: 410 kB     
Downloading: 410 kB     
Downloading: 410 kB     
Downloading: 410 kB
# Balt_busStops <- Balt_bbox %>% 
#   add_osm_feature(key = "highway", value = "bus_stop") %>% 
#   osmdata_sf() %>% 
#   # keep only the points. Note that the query returned 4333 points, 9 polygons, and 1 multi-line feature
#   .$osm_points

Make quick maps using only the point data

# source("~scripts/30 - Read basemaps.R") 
sdg_basemaps <- readRDS("~objects/30/30_sdg_basemaps.rds")
ggmap(Balt_map) +
  geom_sf(data = Balt_busStops,
          inherit.aes = FALSE, # this is necessary
          alpha = 0.5) +
  labs(title = "Bus Stops in Baltimore",
       caption = "Data source: OSM",
       x = "",
       y = "")
Maps of each city
Baltimore

Minneapolis

New Orleans

Query some historical data

API admin

base_url <- "https://api.ohsome.org/v0.9"
api_metadata <- GET(url= paste(base_url, "/metadata", sep = "")) %>% 
  content("text") %>% 
  fromJSON()

The below tells us the database contains data from October 8, 2007 to May 23, 2020.

api_metadata$extractRegion$temporalExtent
## $fromTimestamp
## [1] "2007-10-08T00:00:00Z"
## 
## $toTimestamp
## [1] "2020-06-17T03:00:00Z"

Aggregation endpoint

Use this endpoint to aggregate OSM data, e.g. counts, areas, lengths, and users (contributors).

Let’s look at the trends for mapping bus stops in Baltimore.

# api_bbox <- getbb("Baltimore") %>% bbox_to_string() # note that the order of the coords is flipped from what the database needs
balt_bbox <- "-76.770759, 39.1308816,-76.450759,39.4508816"
# api_bbox <- "8.6581,49.3836,8.7225,49.4363"

api_keys <- "highway"
api_values <- "bus_stop"

monthly <- "2007-11-01/2020-05-23/P1M"

api_data <- GET(url = paste(base_url, "/elements/count", sep = ""),
                query = list(
                  bboxes = balt_bbox,
                  keys = api_keys,
                  values = api_values,
                  time = monthly))

busStops_hist <- content(api_data, as = "text") %>% 
  fromJSON() %>% 
  .$result

The query from osmdata showed 4333 bus stops in Baltimore currently. The OSHDB query shows 4241 bus stops as of May 1, 2020.

ggplot(busStops_hist, 
       aes(x = as.Date(timestamp),
           y = value)) +
  geom_line() +
  theme_bw() +
  labs(title = "Bus Stops in Baltimore Mapped In OSM Over Time",
       caption = "Source: OSHDB, ohsome API",
       x = "Date",
       y = "Count")

Extraction endpoint

Use this endpoint to pull historical snapshots of OSM features.

The column names for the resulting dataframe seem odd - is there a better way to convert the API geoJSON response into sf?

# is there a more elegant way to do the below?
extraction_api_data <- GET(url= paste(base_url, "/elements/geometry", sep = ""),
                query = list(
                  bboxes = balt_bbox,
                  keys = api_keys,
                  values = api_values,
                  types = "point", # do I want to limit it to points only?
                  time = monthly))
busStops_geom_hist <- read_sf(extraction_api_data)
busStops_geom_hist$X.snapshotTimestamp <- as.Date(busStops_geom_hist$X.snapshotTimestamp)

busStops_geom_hist
## Simple feature collection with 84352 features and 3 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -76.77029 ymin: 39.13266 xmax: -76.45081 ymax: 39.44865
## geographic CRS: WGS 84
## # A tibble: 84,352 x 4
##    X.osmId         X.snapshotTimestamp highway             geometry
##    <chr>           <date>              <chr>            <POINT [°]>
##  1 node/1806310072 2013-04-01          bus_stop (-76.46212 39.3776)
##  2 node/1806310072 2013-05-01          bus_stop (-76.46212 39.3776)
##  3 node/1806310072 2013-06-01          bus_stop (-76.46212 39.3776)
##  4 node/1806310072 2013-07-01          bus_stop (-76.46212 39.3776)
##  5 node/1806310072 2013-08-01          bus_stop (-76.46212 39.3776)
##  6 node/1806310072 2013-09-01          bus_stop (-76.46212 39.3776)
##  7 node/1806310072 2013-10-01          bus_stop (-76.46212 39.3776)
##  8 node/1806310072 2013-11-01          bus_stop (-76.46212 39.3776)
##  9 node/1806310072 2013-12-01          bus_stop (-76.46212 39.3776)
## 10 node/1806310072 2014-01-01          bus_stop (-76.46212 39.3776)
## # ... with 84,342 more rows
# busStops_geom_hist <- content(extraction_api_data, as = "text") %>%
#   fromJSON()

# write(busStops_geom_hist %>% toJSON(), "test.json")

According to our aggregated data, there was a massive spike in bus stops in Baltimore on OSM at the beginning of February, 2019: from 280 to 3968.

First, does the geometry data have the same number of observations as the aggregated data? It’s very close - it may be a matter of the time of day queried.

busStops_changeMap <- busStops_geom_hist %>% 
  filter(X.snapshotTimestamp %in% as.Date(c("2019-02-01", "2019-01-01")))

busStops_changeMap %>% group_by(X.snapshotTimestamp) %>% 
  summarize(count = n())
## Simple feature collection with 2 features and 2 fields
## geometry type:  MULTIPOINT
## dimension:      XY
## bbox:           xmin: -76.77029 ymin: 39.13266 xmax: -76.45081 ymax: 39.44865
## geographic CRS: WGS 84
## # A tibble: 2 x 3
##   X.snapshotTimesta~ count                                              geometry
##   <date>             <int>                                      <MULTIPOINT [°]>
## 1 2019-01-01           272 ((-76.7639 39.18362), (-76.76289 39.31459), (-76.761~
## 2 2019-02-01          3960 ((-76.77029 39.41797), (-76.77025 39.41795), (-76.77~

Next, compare to two months side-by-side on a map.

ggmap(sdg_basemaps$Baltimore) +
  geom_sf(data = busStops_changeMap,
          inherit.aes = FALSE, 
          alpha = 0.5) +
  labs(title = "Change in Bus Stops Mapped on OSM in Baltimore",
       subtitle = "Number of bus stops jumped from 280 in January 2019 to 3968 in February 2019.",
       caption = "Data source: OSHDB",
       x = "",
       y = "") +
  facet_wrap(~ X.snapshotTimestamp) 

Tutorial

This tutorial is a guide to using R to query OpenStreetMap and the US Census to track progress made on two UN Sustainable Development Goals.

  • List the two SDGs, why they’re important, how they’re measured, briefly
  • Talk about APIs used
  • Challenges (OSM data inconsistent across locales, and dependent on strength of local mapping community)

Pulling Census Data for Baltimore

For this tutorial, we’ll be looking at Baltimore, Maryland. Our first step is to download data from the US Census Bureau for the city. These data include two types of information that are important for our analysis:

  1. Geography: Choosing a geographic unit of analysis is an important step in any spatial analysis. Patterns that might be present at one unit of geography might not be present for a different unit. For example, we might see that a city has plentiful parks and open space if we look at the city as a whole, but

  2. Demographic and Socio-economic data:

These data will include demographic and socio-economic information that we’re interesting in looking at for the SDG analysis, but it also

Transit SDG

We’ll start with the